ℓ1 Regression using Lewis Weights Preconditioning and Stochastic Gradient Descent

Authors

  • David Durfee (Georgia Institute of Technology)
  • Kevin A. Lai (Georgia Institute of Technology)
  • Saurabh Sawlani (Georgia Institute of Technology)
Abstract

We consider the ℓ1 minimization problem min_x ‖Ax − b‖1 in the overconstrained case, commonly known as the Least Absolute Deviations problem, where there are far more constraints than variables. More specifically, we have A ∈ ℝ^{n×d} for n ≫ d. Many important combinatorial problems, such as minimum cut and shortest path, can be formulated as ℓ1 regression problems [CMMP13]. We follow the general paradigm of preconditioning the matrix and solving the resulting problem with gradient descent techniques, and our primary insight will be that these methods are actually interdependent in the context of this problem. The key idea will be that preconditioning from [CP15] allows us to obtain an isotropic matrix with fewer rows and strong upper bounds on all row norms. We leverage these conditions to find a careful initialization, which we use along with smoothing reductions in [AH16] and the accelerated stochastic gradient descent algorithms in [All17] to achieve ε relative error in about nnz(A) + nd + √(nd)/ε time with high probability. Moreover, we can also assume n ≤ O(d log(n)/ε²) from preconditioning. This improves over the previous best result using gradient descent for ℓ1 regression [YCRM16], and is comparable to the best known running times for interior point methods [LS15]. Finally, we also show that if our original matrix A is approximately isotropic and the row norms are approximately equal, we can avoid using fast matrix multiplication and prove a running time of about nnz(A) + sd/ε + d/ε, where s is the maximum number of non-zeros in a row of A.
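
The pipeline described above can be illustrated concretely. The following Python sketch is illustrative only, not the authors' algorithm: it approximates the ℓ1 Lewis weights with a simple fixed-point iteration in the spirit of [CP15], samples and rescales rows accordingly, uses a QR factorization to change basis so the sampled matrix is isotropic, and then runs plain (non-accelerated) SGD on a Huber-style smoothing of the ℓ1 objective as a stand-in for the reductions of [AH16] and the accelerated method of [All17]. All function names, constants, and the sample size are assumptions made for illustration.

```python
# Illustrative sketch of the preconditioning + SGD paradigm (not the paper's
# exact algorithm). Assumed names and constants throughout.
import numpy as np

def lewis_weights_l1(A, iters=30):
    # Fixed-point iteration w_i <- (a_i^T (A^T diag(1/w) A)^{-1} a_i)^{1/2},
    # in the spirit of [CP15].
    w = np.ones(A.shape[0])
    for _ in range(iters):
        M_inv = np.linalg.pinv(A.T @ (A / w[:, None]))
        lev = np.einsum("ij,jk,ik->i", A, M_inv, A)   # a_i^T M^{-1} a_i
        w = np.sqrt(np.maximum(lev, 1e-12))
    return w

def precondition(A, b, m, seed=0):
    # Sample m rows proportionally to the Lewis weights, rescale so the sampled
    # l1 objective is unbiased in expectation, then orthogonalize so the
    # resulting matrix is isotropic.
    rng = np.random.default_rng(seed)
    w = lewis_weights_l1(A)
    p = w / w.sum()
    idx = rng.choice(A.shape[0], size=m, p=p)
    scale = 1.0 / (m * p[idx])
    SA, Sb = A[idx] * scale[:, None], b[idx] * scale
    Q, R = np.linalg.qr(SA)                           # SA = Q R with Q^T Q = I
    return Q, Sb, R

def smoothed_l1_sgd(Q, c, steps=20000, mu=1e-3, lr=1e-2, seed=0):
    # Plain SGD on sum_i h_mu(q_i^T y - c_i), where h_mu is a Huber-style
    # smoothing of |.| (a stand-in for the machinery of [AH16]/[All17]).
    rng = np.random.default_rng(seed)
    m, _ = Q.shape
    y = Q.T @ c                                       # least-squares initialization
    for t in range(steps):
        i = rng.integers(m)
        r = Q[i] @ y - c[i]
        y -= lr / np.sqrt(t + 1) * m * np.clip(r / mu, -1.0, 1.0) * Q[i]
    return y

# Tiny usage example on synthetic data.
rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 10))
b = A @ rng.standard_normal(10) + 0.1 * rng.laplace(size=2000)
Q, c, R = precondition(A, b, m=400)
x = np.linalg.solve(R, smoothed_l1_sgd(Q, c))         # undo the change of basis
print("l1 error:", np.abs(A @ x - b).sum())
```

The initialization y = Qᵀc is simply the least-squares solution in the preconditioned basis; it is a placeholder for the careful initialization referred to in the abstract, which relies on the isotropy and row-norm bounds obtained from the preconditioning.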

Similar articles

ℓ1 Regression using Lewis Weights Preconditioning

We consider the ℓ1 minimization problem min_x ‖Ax − b‖1 in the overconstrained case, commonly known as the Least Absolute Deviations problem, where there are far more constraints than variables. More specifically, we have A ∈ ℝ^{n×d} for n ≫ d. Many important combinatorial problems, such as minimum cut and shortest path, can be formulated as ℓ1 regression problems [CMMP13]. We follow the general para...


Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to ...
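
For reference, the cumulative-penalty idea named in the title can be sketched for a sparse logistic-regression model: the total L1 penalty each weight could have received so far is tracked globally, and it is applied lazily (clipped at zero) only when the corresponding feature fires. The data format, variable names, and constants below are assumptions; this is a sketch of the general technique, not of the paper's exact procedure.

```python
# Sketch of SGD with a cumulative L1 penalty for sparse logistic regression
# (illustrative; hypothetical data format and names).
import numpy as np

def sgd_cumulative_l1(examples, dim, lam=1e-4, lr=0.1, epochs=5):
    # examples: list of (features, label), features = {index: value}, label in {0, 1}
    w = np.zeros(dim)
    q = np.zeros(dim)        # total L1 penalty actually applied to each weight
    u = 0.0                  # total L1 penalty each weight could have received
    for _ in range(epochs):
        for feats, y in examples:
            idx = np.fromiter(feats.keys(), dtype=int)
            val = np.fromiter(feats.values(), dtype=float)
            p = 1.0 / (1.0 + np.exp(-(w[idx] @ val)))
            w[idx] += lr * (y - p) * val              # gradient step on active features only
            u += lr * lam                             # accumulate the would-be penalty
            for j in idx:                             # lazily apply the cumulative penalty
                z = w[j]
                if z > 0:
                    w[j] = max(0.0, z - (u + q[j]))
                elif z < 0:
                    w[j] = min(0.0, z + (u - q[j]))
                q[j] += w[j] - z
    return w

# Example: two tiny sparse training examples over a 6-dimensional feature space.
data = [({0: 1.0, 2: 1.0}, 1), ({1: 1.0, 3: 1.0}, 0)]
print(sgd_cumulative_l1(data, dim=6))
```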


Efficient Elastic Net Regularization for Sparse Linear Models

We extend previous work on efficiently training linear models by applying stochastic updates to non-zero features only, lazily bringing weights current as needed. To date, only the closed form updates for the ℓ1, ℓ∞, and the rarely used ℓ2 norm have been described. We extend this work by showing the proper closed form updates for the popular squared-ℓ2 and elastic net regularized models. We show a dyn...
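
The lazy-update idea can be made concrete with a small helper: when a feature reappears after k steps of inactivity, the k skipped regularization-only updates are applied in closed form, with geometric decay for the squared-ℓ2 term and a summed shrinkage for the ℓ1 term. This is a sketch of the general technique under an assumed per-step update (decay, then shrink); the function name and constants are hypothetical, not the paper's notation.

```python
# Sketch of lazy ("just-in-time") elastic-net regularization for sparse SGD.
import numpy as np

def bring_current(w_j, k, lr, l1, l2):
    # Closed-form effect of k skipped regularization-only steps, where one step is
    # w <- sign(w) * max(0, (1 - lr*l2) * |w| - lr*l1):
    # geometric decay of the magnitude plus a summed (geometric-series) shrinkage.
    if k <= 0 or w_j == 0.0:
        return w_j
    a, s = 1.0 - lr * l2, lr * l1
    shrink = s * k if a == 1.0 else s * (1.0 - a**k) / (1.0 - a)
    return float(np.sign(w_j)) * max(0.0, a**k * abs(w_j) - shrink)

# Usage: when feature j reappears at step t after last being updated at step last[j],
# call w[j] = bring_current(w[j], t - last[j], lr, l1, l2) before the gradient step.
print(bring_current(0.5, 100, lr=0.1, l1=1e-3, l2=1e-2))
```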


Generalization Error Bounds for Aggregation by Mirror Descent with Averaging

We consider the problem of constructing an aggregated estimator from a finite class of base functions which approximately minimizes a convex risk functional under the l1 constraint. For this purpose, we propose a stochastic procedure, the mirror descent, which performs gradient descent in the dual space. The generated estimates are additionally averaged in a recursive fashion with specific weig...
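
A minimal sketch of the general scheme, under assumptions about the loss and base functions: entropic mirror descent (multiplicative updates projected back to the simplex, i.e. an ℓ1-constrained set) driven by stochastic gradients of a squared-error risk, with the iterates averaged recursively. Names, step sizes, and the toy data are illustrative.

```python
# Sketch of stochastic mirror descent with averaging for convex aggregation.
import numpy as np

def aggregate_mirror_descent(H, y, eta=0.1, seed=0):
    # H: n x M matrix whose columns are the base predictors evaluated on the data;
    # y: n targets. Aggregate with weights theta on the simplex, squared loss.
    rng = np.random.default_rng(seed)
    n, M = H.shape
    theta = np.full(M, 1.0 / M)
    avg = np.zeros(M)
    for t in range(1, n + 1):
        i = rng.integers(n)
        g = 2.0 * (H[i] @ theta - y[i]) * H[i]         # stochastic gradient of the risk
        theta = theta * np.exp(-eta / np.sqrt(t) * g)  # mirror (entropic) step in the dual
        theta /= theta.sum()                           # back to the simplex
        avg += (theta - avg) / t                       # recursive averaging of the iterates
    return avg

# Example: aggregate three fixed base predictors on synthetic data.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=200)
H = np.column_stack([x, x**2, np.sign(x)])
print(aggregate_mirror_descent(H, y=0.7 * x + 0.3 * x**2))
```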


Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD). These methods improve convergence by adapting to the local geometry of parameter space. A second issue is overfitting, which is ...
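
The flavor of such adaptive preconditioning for SGLD can be sketched as follows: an RMSProp-style diagonal preconditioner, built from a running estimate of the squared gradients, rescales both the stochastic-gradient drift and the injected Gaussian noise. This is an illustrative sketch, not the paper's exact update (in particular, correction terms arising from the parameter dependence of the preconditioner are omitted); the toy posterior and constants are assumptions.

```python
# Sketch of an RMSProp-style preconditioned Langevin update.
import numpy as np

def psgld(grad_logpost, theta, steps=5000, lr=1e-2, alpha=0.99, eps=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    v = np.zeros_like(theta)                      # running second-moment estimate
    samples = []
    for _ in range(steps):
        g = grad_logpost(theta)                   # (stochastic) gradient of the log posterior
        v = alpha * v + (1.0 - alpha) * g * g
        G = 1.0 / (eps + np.sqrt(v))              # diagonal preconditioner
        noise = rng.standard_normal(theta.shape) * np.sqrt(lr * G)
        theta = theta + 0.5 * lr * G * g + noise  # preconditioned drift + matched noise
        samples.append(theta.copy())
    return np.array(samples)

# Toy example: sample from a 2-D Gaussian posterior N(mu, I) with mu = (1, -2).
mu = np.array([1.0, -2.0])
draws = psgld(lambda th: -(th - mu), theta=np.zeros(2))
print(draws.mean(axis=0))                         # rough posterior-mean estimate near mu
```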


Publication date: 2017